Efficiency, Robustness and Accuracy in Picky Chart Parsing

Authors

  • David M. Magerman
  • Carl Weir
Abstract

This paper describes Picky, a probabilistic agenda-based chart parsing algorithm which uses a technique called probabilistic prediction to predict which grammar rules are likely to lead to an acceptable parse of the input. Using a suboptimal search method, Picky significantly reduces the number of edges produced by CKY-like chart parsing algorithms, while maintaining the robustness of pure bottom-up parsers and the accuracy of existing probabilistic parsers. Experiments using Picky demonstrate how probabilistic modelling can impact upon the efficiency, robustness and accuracy of a parser.

1. Introduction

This paper addresses the question: Why should we use probabilistic models in natural language understanding? There are many answers to this question, only a few of which are regularly addressed in the literature.

The first and most common answer concerns ambiguity resolution. A probabilistic model provides a clearly defined preference rule for selecting among grammatical alternatives (i.e. the highest probability interpretation is selected). However, this use of probabilistic models assumes that we already have efficient methods for generating the alternatives in the first place. While we have O(n^3) algorithms for determining the grammaticality of a sentence, parsing, as a component of a natural language understanding tool, involves more than simply determining all of the grammatical interpretations of an input. In order for a natural language system to process input efficiently and robustly, it must process all intelligible sentences, grammatical or not, while not significantly reducing the system's efficiency.

This observation suggests two other answers to the central question of this paper. Probabilistic models offer a convenient scoring method for partial interpretations in a well-formed substring table. High probability constituents in the parser's chart can be used to interpret ungrammatical sentences. Probabilistic models can also be used for efficiency by providing a best-first search heuristic to order the parsing agenda.

This paper proposes an agenda-based probabilistic chart parsing algorithm which is both robust and efficient. The algorithm, Picky,¹ is considered robust because it will potentially generate all constituents produced by a pure bottom-up parser and rank these constituents by likelihood. The efficiency of the algorithm is achieved through a technique called probabilistic prediction, which helps the algorithm avoid worst-case behavior. Probabilistic prediction is a trainable technique for modelling where edges are likely to occur in the chart-parsing process.² Once the predicted edges are added to the chart using probabilistic prediction, they are processed in a style similar to agenda-based chart parsing algorithms. By limiting the edges in the chart to those which are predicted by this model, the parser can process a sentence while generating only the most likely constituents given the input.

In this paper, we will present the Picky parsing algorithm, describing both the original features of the parser and those adapted from previous work. Then, we will compare the implementation of Picky with existing probabilistic and non-probabilistic parsers. Finally, we will report the results of experiments exploring how Picky's algorithm copes with the tradeoffs of efficiency, robustness, and accuracy.³

2. Probabilistic Models in Picky

The probabilistic models used in the implementation of Picky are independent of the algorithm. To facilitate the comparison between the performance of Picky and its predecessor, Pearl, the probabilistic model implemented for Picky is similar to Pearl's scoring model, the context-...

* Special thanks to Jerry Hobbs and Bob Moore at SRI for providing access to their computers, and to Salim Roukos, Peter Brown, and Vincent and Steven Della Pietra at IBM for their instructive lessons on probabilistic modelling of natural language.

¹ Pearl = probabilistic Earley-style parser (P-Earl). Picky = probabilistic CKY-like parser (P-CKY).

² Some familiarity with chart parsing terminology is assumed in this paper. For terminological definitions, see [9], [10], [11], or [17].

³ Sections 2 and 3, the descriptions of the probabilistic models used in Picky and the Picky algorithm, are similar in content to the corresponding sections of Magerman and Weir [13]. The experimental results and discussions which follow in sections 4-6 are original.
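The idea of using probabilities as a best-first heuristic to order the parsing agenda can be illustrated with a minimal sketch. It is not the Picky algorithm and does not implement probabilistic prediction; it is a generic best-first, bottom-up chart parser over a toy probabilistic grammar in Chomsky normal form. The grammar, the probabilities, and the names `LEXICON`, `BINARY`, and `best_first_parse` are assumptions made up for this illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper).
LEXICON = {            # word -> [(category, probability)]
    "time":  [("N", 0.6), ("V", 0.1)],
    "flies": [("N", 0.4), ("V", 0.9)],
}
BINARY = {             # (left child, right child) -> [(parent, probability)]
    ("N", "V"): [("S", 0.8)],
    ("V", "N"): [("S", 0.2)],
}

def best_first_parse(words, goal="S"):
    """Return (probability, edges_processed) for the best spanning `goal` edge.

    Returns (None, edges_processed) if no spanning edge exists.
    """
    agenda = []                   # min-heap on negated probability
    chart = {}                    # (category, start, end) -> probability
    by_start = defaultdict(list)  # start index -> [(category, end, prob)]
    by_end = defaultdict(list)    # end index   -> [(category, start, prob)]

    # Seed the agenda with one edge per lexical category of each word.
    for i, word in enumerate(words):
        for cat, p in LEXICON.get(word, []):
            heapq.heappush(agenda, (-p, cat, i, i + 1))

    processed = 0
    while agenda:
        neg_p, cat, start, end = heapq.heappop(agenda)
        if (cat, start, end) in chart:
            continue              # a copy at least as probable was already processed
        processed += 1
        p = -neg_p
        chart[(cat, start, end)] = p
        if cat == goal and start == 0 and end == len(words):
            return p, processed   # stop as soon as a spanning goal edge is popped
        by_start[start].append((cat, end, p))
        by_end[end].append((cat, start, p))
        # Combine with processed edges immediately to the right.
        for rcat, rend, rp in by_start[end]:
            for parent, rule_p in BINARY.get((cat, rcat), []):
                heapq.heappush(agenda, (-(rule_p * p * rp), parent, start, rend))
        # Combine with processed edges immediately to the left.
        for lcat, lstart, lp in by_end[start]:
            for parent, rule_p in BINARY.get((lcat, cat), []):
                heapq.heappush(agenda, (-(rule_p * lp * p), parent, lstart, end))
    return None, processed

prob, processed = best_first_parse(["time", "flies"])
print(prob, processed)
```

Because every rule probability is at most 1, an edge's score never exceeds the scores of its children, so the first spanning goal edge popped from the agenda is also the most probable one. On this two-word input the parser stops after processing three edges, leaving the two lower-probability lexical edges on the agenda.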

Similar resources

Efficiency, Robustness and Accuracy in Picky Chart Parsing

This paper describes Picky, a probabilistic agenda-based chart parsing algorithm which uses a technique called probabilistic prediction to predict which grammar rules are likely to lead to an acceptable parse of the input. Using a suboptimal search method, Picky significantly reduces the number of edges produced by CKY-like chart parsing algorithms, while maintaining the robustness of pure bot...

Probabilistic Prediction and Picky Chart Parsing

This paper describes Picky, a probabilistic agenda-based chart parsing algorithm which uses a technique called probabilistic prediction to predict which grammar rules are likely to lead to an acceptable parse of the input. In tests on randomly selected test data, Picky generates fewer edges on average than other CKY-like algorithms, while achieving 89% first parse accuracy and also enabling th...

Are Efficient Natural Language Parsers Robust?

This paper discusses the robustness of four efficient syntactic error-correcting parsing algorithms that are based on chart parsing with a context-free grammar. In this context, by robust we mean able to correct detectable syntactic errors. We implemented four versions of a bottom-up error-correcting chart parser: a basic bottom-up chart parser, and chart parsers employing selectivity, top-down...

Improving the Efficiency of a Wide-Coverage CCG Parser

The C&C CCG parser is a highly efficient linguistically motivated parser. The efficiency is achieved using a tightly-integrated supertagger, which assigns CCG lexical categories to words in a sentence. The integration allows the parser to request more categories if it cannot find a spanning analysis. We present several enhancements to the CKY chart parsing algorithm used by the parser. The firs...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of natural language sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline. Unfortunately, in pipel...

Publication date: 1992